Identifying query aspects

ABSTRACT

Methods, systems, and apparatus, including computer program products, for generating aspects associated with entities. In some implementations, a method includes receiving data identifying an entity; generating a group of candidate aspects for the entity; modifying the group of candidate aspects to generate a group of modified candidate aspects comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; and storing an association between one or more highest ranked modified candidate aspects and the entity. The aspects can be used to organize and present search results in response to queries for the entity.

BACKGROUND

This specification relates to providing, in response to search queries, information identifying aspects of entities identified in the search queries, and using the aspects in presenting information in response to the search queries.

Internet search engines provide information about Internet accessible resources (e.g., Web pages, images, text documents, multimedia content) that are responsive to a user's search query and present information about the resources in a manner that is useful to the user. Internet search engines return a set of search results (e.g., as a ranked list of results) in response to a user submitted query. A search result includes, for example, a URL and a snippet of information from a corresponding resource. Conventional search engines are implemented under an assumption that the user's search query can be satisfied by a single result, and work to help the user find that result. Unfortunately, users are not always looking for a single result, but are instead using the query as a starting point for exploration of an unknown space of information about something that they may initially refer to in a generic way.

For example, a user may submit a query that names or refers to an entity as a starting point for exploring various aspects associated with that entity. When used in reference to operations of an information retrieval system, e.g., a search engine, the term “entity” refers to text that names or identifies something. This something can be any object that can have associated properties (e.g., an object in the physical, conceptual or mythical world). For example, an entity can refer to a location, a person, a fictional character, a state, a thing, an idea, and so on. When the meaning is clear from context, and to avoid unnecessary verbiage, the term “entity” may also be used to refer to the thing itself.

Aspects are different axes of information along which additional information about an entity can be obtained. For example, for an entity “Hawaii”, possible aspects can include “beaches,” “hotels,” and “weather.” As with the term “entity”, when used in reference to operations of an information retrieval system, e.g., a search engine, the term “aspect” refers to text that names the aspect in question, and otherwise, when the meaning is clear from context, the term may also be used to refer to the aspect itself.

A single ranked list of results provided by a conventional search engine typically fails to provide users an overview of different aspects of the entity. Rather, the single ranked list often provides many results directed to a single aspect or a small number of aspects. Additionally, the presented results typically do not identify the aspects represented.

SUMMARY

This specification describes technologies relating to identifying aspects associated with entities.

In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a query in a computer system, the computer system comprising one or more computers, the query including an entity; generating in the computer system a group of candidate aspects for the entity; modifying in the computer system the group of candidate aspects to generate a group of modified candidate aspects comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking in the computer system one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; associating in the computer system one or more highest ranked modified candidate aspects with the entity; receiving in the computer system one or more sets of search results; and providing a presentation of the search results in response to the query, the presentation presenting the search results organized according to the aspects associated with the entity. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.

These and other embodiments can each optionally include one or more of the following features. The method can further include presenting a summary of information about an entity in accordance with an aspect. The one or more sets of search results can include a set of search results responsive to the query. Each of the one or more sets of search results can correspond to a respective aspect associated with the entity.

In general, another aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving data identifying an entity; generating in a computer system a group of candidate aspects for the entity, the computer system comprising one or more computers; modifying in the computer system the group of candidate aspects to generate a group of modified candidate aspects, comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking in the computer system one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; and storing an association of one or more of the highest ranked modified candidate aspects with the entity in a data storage device of the computer system. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.

These and other embodiments can each optionally include one or more of the following features. The method can further include receiving a query including the entity; identifying one or more aspects associated with the entity; receiving search results responsive to the query; and presenting the search results based on the identified aspects. The method can further include receiving a query including the entity; identifying one or more aspects associated with the entity; receiving one or more sets of search results, each set corresponding to one of the identified aspects; and presenting the search results based on the identified aspects.

The method can further include receiving data identifying one or more entity properties, where generating the group of candidate aspects includes using the one or more entity properties; and the one or more highest ranked candidate aspects are associated with both the entity and the entity properties. The method can further include associating the entity with a class, the class having one or more class members including the entity; and where generating the group of candidate aspects includes generating candidate aspects corresponding to the entity and the class. Generating the group of candidate aspects can include analyzing one or more first user search histories to identify queries associated with the entity; and analyzing one or more second user search histories to identify queries associated with a class member other than the entity.

Combining candidate aspects can include calculating similarity scores, where each similarity score is an estimate of similarity between two candidate aspects; and combining candidate aspects into a single modified candidate aspect based on the similarity scores. Each candidate aspect can be expressed as text and the similarity score between two candidate aspects can be based on a comparison of the strings of text associated with each candidate aspect. Calculating a similarity score between two candidate aspects can include receiving a respective set of search results for each aspect; and calculating the similarity score based on a comparison of the sets of search results. The comparison of the sets of search results can include a comparison of paths of the search results in one of the sets of search results to paths of the search results in the other one of the sets of search results. The comparison of the sets of search results can include a comparison of titles and snippets of the search results in one of the sets of search results to titles and snippets of the search results in the other one of the sets of search results. Combining candidate aspects based on the similarity scores can further include using a graph partition algorithm to determine which aspects to combine.

Grouping candidate aspects using one or more aspect classes can include associating two or more candidate aspects with a respective aspect class; and grouping two or more candidate aspects into a single modified candidate aspect based on their aspect classes. The single modified candidate aspect can be an aspect class.

Ranking one or more modified candidate aspects based on a diversity score and a popularity score can include calculating a popularity score for each aspect; ranking the aspect with the highest popularity score the highest; and ranking the remaining aspects by repeating the following steps one or more times: calculating a similarity score for each un-ranked aspect, where the similarity score compares the similarity of the un-ranked aspect to the ranked aspects; and assigning the next highest ranking to the aspect whose popularity score divided by its similarity score is the highest.
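The greedy popularity/diversity ranking just described can be sketched as follows. This is a minimal illustration, not the specification's implementation: the text does not say how an un-ranked aspect's similarity to several ranked aspects is aggregated, so the sketch assumes the maximum pairwise similarity, and the small `epsilon` guarding against division by zero, the function names, and the example scores are all hypothetical.

```python
from typing import Callable, Dict, List


def rank_aspects(
    popularity: Dict[str, float],
    similarity: Callable[[str, str], float],
    epsilon: float = 1e-6,
) -> List[str]:
    """Greedy ranking that trades popularity against similarity to ranked aspects."""
    remaining = dict(popularity)
    ranked: List[str] = []
    if not remaining:
        return ranked
    # The aspect with the highest popularity score is ranked first.
    first = max(remaining, key=remaining.get)
    ranked.append(first)
    del remaining[first]

    def score(aspect: str) -> float:
        # Assumption: similarity to the ranked aspects is the maximum pairwise
        # similarity; epsilon avoids division by zero.
        sim = max(similarity(aspect, r) for r in ranked)
        return remaining[aspect] / max(sim, epsilon)

    while remaining:
        # Next highest rank: largest popularity / similarity ratio.
        best = max(remaining, key=score)
        ranked.append(best)
        del remaining[best]
    return ranked


# Hypothetical scores for the entity "Hawaii".
pairs = {frozenset({"hotels", "resorts"}): 0.9}


def example_similarity(a: str, b: str) -> float:
    return pairs.get(frozenset({a, b}), 0.1)


popularity = {"beaches": 10.0, "hotels": 8.0, "resorts": 7.0, "weather": 5.0}
print(rank_aspects(popularity, example_similarity))
# ['beaches', 'hotels', 'weather', 'resorts']
```

In this example “resorts” is more popular than “weather” but is pushed below it because it is very similar to the already-ranked “hotels”, which is the diversity effect the ratio is meant to produce.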

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Aspects of an entity in a search query can be identified. Aspects can be presented to make it easy for users to explore the search space along multiple axes. The use of aspects allows a user to explore the search space beyond the scope of his or her original query. The presentation of aspects also allows a user to quickly gain an overview of what the possible axes of search are. The presentation of aspects can allow a user to browse a search space efficiently, for example, by using faceted browsing. Information related to the aspects can be identified and presented to the user. This information can allow a user to quickly gain information he or she needs about multiple aspects of the entity. Mashups can be presented to a user as a way of visualizing information about the aspects of the entity. The mashups present information associated with several aspects in a single integrated interface.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the invention will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example search system for providing search results relevant to submitted queries.

FIG. 2 illustrates an example method for associating aspects with an entity.

FIG. 3 illustrates an example of combining similar candidate aspects.

FIG. 4 illustrates an example of grouping aspects based on their aspect classes.

FIG. 5 illustrates an example of ranking an unranked aspect, given a pre-existing group of one or more ranked aspects.

FIG. 6 illustrates an example method for receiving a query including one or more terms corresponding to an entity and presenting search results based on the identified aspects of the entity.

FIG. 7 illustrates an example mashup displayed after a user submits a search query.

FIG. 8 illustrates an example architecture of a system.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

FIG. 1 illustrates an example search system 114 for providing search results relevant to submitted queries as can be implemented in an internet, an intranet, or another client and server environment. The search system 114 is an example of an information retrieval system in which the systems, components, and techniques described below can be implemented.

A user 102 can interact with the search system 114 through a client device 104. For example, the client 104 can be a computer coupled to the search system 114 through a local area network (LAN) or wide area network (WAN), e.g., the Internet. In some implementations, the search system 114 and the client device 104 can be one machine. For example, a user can install a desktop search application on the client device 104. The client device 104 will generally include a random access memory (RAM) 106 and a processor 108.

A user 102 can submit a query 110 to a search engine 130 within a search system 114. When the user 102 submits a query 110, the query 110 is transmitted through a network to the search system 114. The search system 114 can be implemented as, for example, computer programs running on one or more computers in one or more locations that are coupled to each other through a network. The search system 114 includes an index database 122 and a search engine 130. The search system 114 responds to the query 110 by generating search results 128, which are transmitted through the network to the client device 104 in a form that can be presented to the user 102 (e.g., a search results web page to be displayed in a web browser running on the client device 104).

When the query 110 is received by the search engine 130, the search engine 130 identifies resources that match the query 110. The search engine 130 may also identify a particular “snippet” or section of each resource that is relevant to the query. The search engine 130 will generally include an indexing engine 120 that indexes resources (e.g., web pages, images, or news articles on the Internet) found in a corpus (e.g., a collection or repository of content), an index database 122 that stores the index information, and a ranking engine 152 (or other software) to rank the resources that match the query 110. The indexing and ranking of the resources can be performed using conventional techniques. The search engine 130 can transmit the search results 128 through the network to the client device 104, for example, for presentation to the user 102.

The search system 114 may also maintain one or more user search histories based on the queries it receives from a user. Generally speaking, a user search history stores a sequence of queries received from a user. User search histories may also include additional information such as which results were selected after a search was performed and how long each selected result was viewed.

In some implementations, the search system 114 includes an aspector 140. Alternatively, the aspector 140 can be implemented in one or more distinct systems coupled to the search system 114. The aspector 140 associates aspects with particular entities. Additionally, the aspector 140 can receive the query 110 and, in conjunction with the search engine 130, provide aspect-based search results to the user 102. Identifying and using aspects will be described in greater detail below.

FIG. 2 illustrates an example method 200 for associating aspects with an entity. For convenience, the example method 200 will be described in reference to a system that performs the method 200. The system can be, for example, the search system 114, or a separate system.

The system receives an entity (step 202). An entity can be any object that can have associated properties (e.g., an object in the physical or conceptual world). For example, an entity can be a location, a person, a thing, an idea, etc. The system can receive the entities from a variety of sources. For example, the system can receive an entity directly from a user or in response to actions performed by the system (e.g., the action of executing a process). An entity can be extracted from a search query received from a user or the search system 114, for example, by parsing the query and comparing the terms of the query to a database of possible entities. Other sources of an entity are also possible; for example, an entity can be extracted from query data, such as user search histories.

In some implementations, the system also receives data identifying one or more properties of the entity. Properties of entities are additional elements associated with an entity that can be used to further refine the entity. For example, “travel” can be a property of the entity “Vietnam” because people travel to Vietnam.

The system generates a group of candidate aspects for the entity (step 204). The candidate aspects can be generated based on the entity, or alternatively, based on a class associated with the entity. The class is an abstraction of the entity. For example, “chocolate cake” could be associated with the class “food,” because chocolate cake is a type of food. “Daffodil” could be associated with the class “flower,” because a daffodil is a type of flower. The class can have multiple members. Each member is also an entity. For example, the class “flowers” could include many types of flowers, including “tulips,” “alstroemeria,” “roses,” and so on.

In some implementations, both entity-based aspects and class-based aspects are used. Reliance on both entity-based aspects and class-based aspects can result in a more robust set of aspects. For example, some entities are so rare that there is only a small amount of data on which to base the aspects. For these entities, relying on class-based aspects can increase the number of candidate aspects. However, some entities are very popular and may have entity-specific aspects that can be identified, for example, from user search histories. Therefore, also including entity-based aspects can be useful for these more popular entities.

In some implementations, generating a group of candidate aspects for the entity includes analyzing query data for queries including the entity. The query data can be analyzed, for example, to identify query refinements and query super-strings.

A query refinement occurs when a user first issues a query for the entity, and then follows that query with another related query. For example, if a user issues a query for “popcorn” followed by a query for “microwave popcorn,” microwave popcorn can be identified as a query refinement for popcorn. Query refinements do not have to include the original query. For example, if a user issues a query for “computer” followed by a query for “laptop,” laptop can be identified as a query refinement for computer. Query refinements can provide valuable information about an entity, because they indicate how a given user chose to explore the search space for the entity.

Query refinements can be generated as follows. One or more user search histories including queries for the entity can be identified. Each user search history is then divided into sessions, where each session represents a group of queries issued by a given user for a given information finding task. A session can be measured in a number of ways including, for example, by a specified period of time (for example, thirty minutes), by a specified number of queries (for example, 15 queries), until a specified period of inactivity (for example, ten minutes without performing a search), or while a user is logged-in to a search system.

The sessions that do not include a query for the entity can be filtered out. The queries that follow a query for the entity in the remaining sessions are query refinements. Each of the query refinements indicates a potential candidate aspect. For example, a candidate aspect can be the query refinement itself, or the part of the query refinement that does not include the entity. Candidate aspects can also be identified by analyzing the query refinement using linguistic analysis techniques, for example, using dictionaries or statistical analysis to identify the terms in the query refinement that are most likely to be aspects, or by looking the query refinement up in a database that associates query refinements with aspects. Potential candidate aspects can be aggregated across users, and candidate aspects that do not appear more than a threshold number of times can be filtered out.
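The refinement pipeline above (split each history into sessions, keep sessions containing the entity, take the queries that follow it, strip the entity text, aggregate, threshold) can be sketched as below. This is a hypothetical illustration only: it assumes each history is a list of `(timestamp_in_seconds, query)` pairs, uses only the inactivity-gap session rule (ten minutes), treats a “query for the entity” as a query exactly equal to the entity text, and derives the aspect by removing the entity string; none of these names or choices come from the specification.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed format: a search history is a list of (timestamp_in_seconds, query) pairs.
History = List[Tuple[float, str]]


def split_sessions(history: History, max_gap: float = 600.0) -> List[List[str]]:
    """Start a new session after more than max_gap seconds of inactivity."""
    sessions: List[List[str]] = []
    current: List[str] = []
    last_time = None
    for timestamp, query in sorted(history):
        if last_time is not None and timestamp - last_time > max_gap:
            sessions.append(current)
            current = []
        current.append(query)
        last_time = timestamp
    if current:
        sessions.append(current)
    return sessions


def refinement_aspects(
    histories: List[History], entity: str, threshold: int = 1, max_gap: float = 600.0
) -> Dict[str, int]:
    counts: Counter = Counter()
    for history in histories:
        for session in split_sessions(history, max_gap):
            if entity not in session:
                continue  # drop sessions with no query for the entity
            start = session.index(entity)
            for refinement in session[start + 1:]:
                # Candidate aspect: the refinement text with the entity removed.
                aspect = " ".join(refinement.replace(entity, " ").split())
                if aspect:
                    counts[aspect] += 1
    # Keep only aspects seen more than `threshold` times across users.
    return {aspect: n for aspect, n in counts.items() if n > threshold}
```

For example, two users who each follow “popcorn” with “microwave popcorn” within the same session yield the candidate aspect “microwave” with a count of 2, while a query issued hours later falls into a separate session and is ignored.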

In some implementations, query refinements are generated for a query based on both the entity in the query and the entity's associated properties, instead of just the entity.

Generally speaking, a query is a super-string of another query when it includes the other query. For example, “Vietnam travel package” is a super-string of “Vietnam travel,” because it includes the text “Vietnam travel.” Unlike query refinements, a query super-string does not have to be sent during the same session as the query for which it is a super-string.

Query super-strings can be generated by considering one or more user search histories and identifying queries that include the entity. Each query super-string indicates a potential candidate aspect. For example, a candidate aspect can be the part of the query super-string that does not include the entity. In some implementations, the query super-string is filtered to remove common words such as “a” and “the” before the candidate aspect is identified. Candidate aspects can also be identified from the query super-string using linguistic techniques or a database, as described above. Potential candidate aspects can be aggregated across users, and candidate aspects that do not appear more than a threshold number of times can be filtered out.
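Super-string extraction can be sketched in the same spirit. In this hypothetical sketch the entity is matched as a contiguous, case-insensitive token sequence, the remaining tokens minus stop words form the candidate aspect, and a query identical to the entity is skipped because it is not a strict super-string. The stop-word list beyond “a” and “the” and all function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed stop-word list; the text only names "a" and "the".
STOPWORDS = {"a", "an", "the", "of", "for", "in", "to"}


def superstring_aspects(
    queries: Iterable[str], entity: str, threshold: int = 1
) -> Dict[str, int]:
    entity_tokens = entity.lower().split()
    n = len(entity_tokens)
    counts: Counter = Counter()
    for query in queries:
        tokens = query.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == entity_tokens:
                # The candidate aspect is what remains once the entity is removed.
                rest = [t for t in tokens[:i] + tokens[i + n:] if t not in STOPWORDS]
                if rest:  # an exact query for the entity is not a super-string
                    counts[" ".join(rest)] += 1
                break  # count each query at most once
    return {aspect: c for aspect, c in counts.items() if c > threshold}
```

Because super-strings need not occur in the same session as the entity query, the function takes a flat list of queries rather than sessions.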

In some implementations, query super-strings are identified for queries that include text naming the entity and its properties, rather than just the entity.

In some implementations, the system associates the entity with a class and generates class-based candidate aspects for the entity.

In some implementations, the system associates the entity with a class based on a pre-defined database that associates entities with classes. This pre-defined database can be generated, for example, by analyzing knowledge base information (e.g., information from Wikipedia™, run by the Wikimedia Foundation, or Freebase™, run by Metaweb Technologies). Generally speaking, a knowledge base is a collection of information for one or more entities. Knowledge bases can specify relationships between entities, such as class relationships, and can also specify features of entities. For example, a knowledge base could specify that “Canada” is in a class called “country” and that one of its features is its “GDP.” Entity-class relationships can be identified from the knowledge base information, and associations based on the relationships can be stored in the database for future use. The pre-defined database can also be generated by querying the search system 114 for Hearst patterns, e.g., if the entity is “Boston,” a query for “X such as Boston” can be issued to the search system. The results can then be analyzed for sentences including “such as Boston” and the resulting class can be identified. For example, if several of the search results included the phrase “cities such as Boston,” then Boston could be associated with a class of “city.” In some implementations, the entity does not have to be a perfect match with an entity in the database in order for an association to be identified. For example, small differences such as whether the entity is singular or plural may be overlooked. For example, if the singular “rose” was stored in the database, but the entity was “roses,” the class information for rose could be used. Other small differences, such as spelling variations, may also be overlooked.
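The Hearst-pattern step can be illustrated with a short sketch that scans result snippets for “&lt;word&gt; such as &lt;entity&gt;” and keeps classes seen at least `min_count` times. It is only a rough, hypothetical sketch: it captures just the single word before “such as”, and `naive_singular` is a deliberately crude stand-in for real morphological normalization, included to show how “cities” could map to the class “city”.

```python
import re
from collections import Counter
from typing import Iterable, List


def naive_singular(word: str) -> str:
    # Very rough English singularization, for illustration only.
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def classes_from_hearst(
    snippets: Iterable[str], entity: str, min_count: int = 2
) -> List[str]:
    # Capture the single word immediately before "such as <entity>".
    pattern = re.compile(
        r"(\w+)\s+such\s+as\s+" + re.escape(entity) + r"\b", re.IGNORECASE
    )
    counts = Counter(
        naive_singular(match.group(1).lower())
        for snippet in snippets
        for match in pattern.finditer(snippet)
    )
    # Classes ordered by frequency, keeping those seen at least min_count times.
    return [cls for cls, n in counts.most_common() if n >= min_count]
```

The word boundary after the entity keeps “Bostonian” from matching “Boston”; a production system would likely use a part-of-speech tagger to capture multi-word class names.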

In some implementations, the system associates the entity with a class on the fly, for example, by accessing knowledge base information (e.g., crawling a website such as Wikipedia™) and identifying a class associated with the received entity, or issuing a query with a Hearst pattern including the entity. Other techniques for associating an entity with a class are also possible. For example, the entity can be classified based on machine learning techniques, such as support vector machines. Alternatively, a user can specify the class that is associated with an entity.

Class-based aspects can be generated, for example, by analyzing query data for queries including a class member other than the entity. For example, if the entity was “daffodils” and its class was “flowers,” then query data could be analyzed for queries including “roses,” because “roses” is one of the members of the flowers class. The query data for the class member can be analyzed to identify aspects much as the query data for the entity is analyzed to identify aspects, as described above. When the entity is associated with one or more properties, these properties can be included with each class member for purposes of identifying aspects. In some implementations, class-based aspects are generated only from class members that are sufficiently close to the entity, e.g., within a threshold of time or space or another measure of distance between entities. For example, “Canada,” “Belgium,” and “France” are all in the class “country.” However, Belgium and France are neighboring countries. Therefore, if the entity is “Belgium,” the system can identify class-based aspects based on the class member “France” but not the class member “Canada,” because Canada is too far away from Belgium. The threshold can be a number of miles, a number of days, or another measure of distance. The threshold can be determined empirically.

Other methods of generating candidate aspects are also possible; for example, candidate aspects can be generated by analyzing knowledge base information associated with the entity or its class members. Knowledge bases can provide binary relationships between a given entity and its features. For example, Wikipedia™ provides an “Infobox” for some entities. The Infobox for Cambodia lists features such as capital, flag, population, area, and GDP. These can provide additional aspects for the entity Cambodia. Candidate aspects can also be retrieved from a database associating entities or class members with potential candidate aspects.

In some implementations, the candidate aspects are filtered based on user feedback on aspects that had been previously associated with entities and presented to users. The user feedback can indicate which aspects are useful aspects of an entity, and which aspects are not useful aspects of an entity. The user feedback can be used to directly filter out aspects that users have indicated are not useful. Alternatively, the user feedback can be used as training inputs to train a machine to filter candidate aspects using machine learning techniques.

The system modifies the group of candidate aspects (step 206). Modifying the group of candidate aspects can include combining similar candidate aspects and grouping candidate aspects based on a class of one or more candidate aspects. This combining and grouping reduces redundant aspects and helps focus the aspects on various axes of search.

Often similar aspects are generated. For example, for the query “Vietnam travel” the aspects “package,” “packages,” and “deal” could all be generated. All of these aspects refer to the same basic concept: a product bundling various aspects of a trip into one package. Consequently, these aspects can be combined into a single aspect.

FIG. 3 illustrates an example of combining similar candidate aspects. An initial group of candidate aspects 302 contains four aspects: Aspect 1, Aspect 1′, Aspect 2, and Aspect 3.

A similarity score can be calculated for each pair of aspects in the group of candidate aspects 302. For example, Aspect 1 and Aspect 1′ have a similarity score 304 of 0.9. Aspect 1 and Aspect 2 have a similarity score 306 of 0.5, and Aspect 1′ and Aspect 2 have a similarity score 308 of 0.3.

In some implementations, calculating the similarity score for two aspects includes identifying a respective set of search results corresponding to a query for each aspect, and then comparing the search results. The search results can be generated by issuing a query to a search engine (e.g., search engine 130 in FIG. 1) for each aspect. The top n search results for each query are then chosen as the set of search results for the respective aspect, where n can be any integer chosen to give a sufficient amount of information for comparison (e.g., 8 or 10). For purposes of illustration, let D_(i) be the set of search results d_(i) ∈ D_(i) that correspond to a first aspect, and let D_(j) be the set of search results d_(j) ∈ D_(j) that correspond to a second aspect being compared to the first aspect. The similarity score for the two sets of search results, and therefore the two aspects, can be calculated as follows.

A feature vector is generated for each search result in D_(i) and D_(j). For example, a feature vector can include one or more features (e.g., terms) and a corresponding statistical measure of the importance of the feature to the user (e.g., a term frequency (tf) weight or a term frequency-inverse document frequency (tf-idf) weight for each feature). The terms can be all words in the search result, or a subset of the words of the search result (for example, the title of the result and the snippet identified by a search engine).

In some implementations, tf weights are used as statistical measures of the importance of a feature to the user. The tf weights can be used because the importance of a feature to the user can increase proportionally according to the frequency with which the feature occurs (e.g., a term frequency) in a collection of documents, for example, all documents indexed by the search system (e.g., search system 114 in FIG. 1), or all documents indexed by the search system that are in the same language as the term.

The term frequency in a search result is the relative frequency that a particular term occurs in the search result, and can be represented as:

${{tf}_{q,p} = \frac{n_{q,p}}{\sum\limits_{k}\; n_{k,p}}},$

where the term frequency is a number n_(q,p) of occurrences of the particular term t_(q) in a search result d_(p) divided by the number of occurrences of all terms t_(k) in d_(p).
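As a concrete check of the tf formula, the sketch below computes n_(q,p) divided by the total term count for every term of one search result. Whitespace tokenization with lower-casing is an assumption made for illustration; the specification does not fix a tokenizer.

```python
from collections import Counter
from typing import Dict


def term_frequencies(text: str) -> Dict[str, float]:
    """tf_{q,p} = n_{q,p} / sum_k n_{k,p} for every term t_q in result d_p."""
    terms = text.lower().split()  # assumption: whitespace tokenization
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


print(term_frequencies("Hawaii beaches Hawaii hotels"))
# {'hawaii': 0.5, 'beaches': 0.25, 'hotels': 0.25}
```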

In some implementations, tf-idf weights are used as the statistical measures of the importance of the features to the user. A tf-idf weight can be calculated by multiplying a term frequency by an inverse document frequency (idf).

The idf is based on how frequently a term appears in a collection of documents, for example, all documents indexed by the search system, or all documents indexed by the search system that are in the same language as the term; terms that appear in fewer documents receive a higher idf. The inverse document frequency can be represented as:

${{idf}_{q} = \log \frac{D}{D_{p}}}, \quad D_{p} = \left\lvert \left\{ d_{p} : t_{q} \in d_{p} \right\} \right\rvert,$

where the number D of all documents in the corpus of documents is divided by a number D_(p) of documents d_(p) containing the term t_(q). In some implementations, the Napierian logarithm is used instead of the logarithm of base 10.

A tf-idf weight can be represented as:

${{tf\_idf}_{q,p} = {tf}_{q,p} \cdot {idf}_{q}}.$
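The idf and tf-idf formulas can be sketched together as follows. The corpus, the whitespace tokenizer, and the choice to give terms missing from the corpus an idf of 0 are illustrative assumptions; `base` defaults to 10 and can be set to `math.e` for the Napierian variant mentioned above.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequencies(corpus: List[str], base: float = 10.0) -> Dict[str, float]:
    """idf_q = log(D / D_p), where D_p counts documents containing term t_q."""
    total_docs = len(corpus)
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(set(doc.lower().split()))  # count each term once per document
    return {term: math.log(total_docs / df, base) for term, df in doc_freq.items()}


def tf_idf(result_text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """tf_idf_{q,p} = tf_{q,p} * idf_q for each term of one search result."""
    terms = result_text.lower().split()
    counts = Counter(terms)
    total = len(terms)
    # Assumption: terms absent from the corpus get an idf of 0.
    return {term: (n / total) * idf.get(term, 0.0) for term, n in counts.items()}
```

With a four-document corpus in which “hawaii” appears three times and “beaches” once, “beaches” receives the larger idf (log 4 versus log 4/3), so it dominates the tf-idf vector of a result even though both terms have the same tf there.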

A similarity score is calculated for each pair of search results {d_(i), d_(j)}. The similarity score for each pair can be calculated by comparing the feature vectors for the two results. For example, if a search result d_(i) has a feature vector of X=(x₁, x₂, x₃) and a search result d_(j) has a feature vector of Y=(y₁, y₂, y₃), sim(d_(i), d_(j)) can be represented as the cosine similarity of the two vectors:

${{sim}\left( d_{i},d_{j} \right)} = \frac{X \cdot Y}{\lVert X \rVert \cdot \lVert Y \rVert} = \frac{x_{1} y_{1} + x_{2} y_{2} + x_{3} y_{3}}{\sqrt{x_{1}^{2} + x_{2}^{2} + x_{3}^{2}} \cdot \sqrt{y_{1}^{2} + y_{2}^{2} + y_{3}^{2}}}.$

The similarity score for two sets of search results, D_(i) and D_(j), as a whole can be calculated from the similarity scores between their individual search results. In some implementations, the similarity scores for all pairs of search results are averaged. In some implementations, the average of the highest similarity score for each search result is used, as follows:

${{{sim}\left( {D_{i},D_{j}} \right)} = {\frac{\Sigma_{i}{{sim}\left( {d_{i},D_{j}} \right)}}{2{D_{i}}} + \frac{\Sigma_{j}{{sim}\left( {d_{j},D_{i}} \right)}}{2{D_{j}}}}},$

where

$\mathrm{sim}(d_{i}, D_{j}) = \max_{d_{k} \in D_{j}} \mathrm{sim}(d_{i}, d_{k}) \quad \text{and} \quad \mathrm{sim}(d_{j}, D_{i}) = \max_{d_{k} \in D_{i}} \mathrm{sim}(d_{k}, d_{j}),$

and where max_(k) sim(d_(i), d_(k)) is the maximum of the similarity scores between the search result d_(i) and all search results in D_(j), and max_(k) sim(d_(k), d_(j)) is the maximum of the similarity scores between the search result d_(j) and all search results in D_(i).
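The set-level score can be sketched independently of the pairwise measure. In this hypothetical helper the pairwise similarity is passed in as a function (for example, a cosine similarity over feature vectors); treating an empty result set as unrelated is an assumption the text does not address.

```python
def set_similarity(results_i, results_j, sim):
    """sim(D_i, D_j): each result's best match in the other set, averaged per set and halved.

    sim is any pairwise similarity function, e.g., a cosine similarity over feature vectors.
    """
    if not results_i or not results_j:
        return 0.0  # assumption: an empty result set is unrelated to every other set
    best_i = [max(sim(d_i, d_k) for d_k in results_j) for d_i in results_i]
    best_j = [max(sim(d_k, d_j) for d_k in results_i) for d_j in results_j]
    return sum(best_i) / (2 * len(results_i)) + sum(best_j) / (2 * len(results_j))


# Toy example: results are identified by URL, and two results match only if identical.
def exact_match(a, b):
    return 1.0 if a == b else 0.0


print(set_similarity(["u1", "u2"], ["u1", "u3", "u4"], exact_match))  # 1/4 + 1/6, about 0.417
```

Halving each side's average keeps the score in [0, 1] whenever the pairwise scores are, and makes it symmetric in D_(i) and D_(j).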

Other similarity measures can also be used, for example, determining a single feature vector for all search results for each aspect and calculating the similarity score based on the similarity of the two feature vectors, e.g., their cosine similarity.

Alternatively, the similarity score for two aspects can be calculated by comparing the paths (e.g., web addresses, file paths) of the search results for each aspect, for example, by parsing the text of the paths and extracting features, such as a domain name or a directory in a file system, and then comparing the extracted features. The similarity score for two aspects can also be calculated by comparing the text of the aspects themselves, for example, by comparing the characters in the text of the two aspects.
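The following sketch illustrates both alternatives under stated assumptions: Jaccard overlap of extracted path features (a swapped-in choice, since the text does not specify how the extracted features are compared) and Python's standard `difflib.SequenceMatcher` for the character-level comparison of aspect text. The feature prefixes `host:` and `dir:` are hypothetical.

```python
import difflib
from urllib.parse import urlparse


def path_features(url):
    """Hypothetical feature extractor: the host name plus each directory prefix of the path."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    features = {"host:" + host}
    # Directory prefixes only; the last segment is treated as a file name and skipped.
    for depth in range(1, len(parts)):
        features.add("dir:" + host + "/" + "/".join(parts[:depth]))
    return features


def path_similarity(urls_a, urls_b):
    """Jaccard overlap between the path features of two aspects' search results."""
    feats_a = set().union(*(path_features(u) for u in urls_a))
    feats_b = set().union(*(path_features(u) for u in urls_b))
    if not feats_a and not feats_b:
        return 0.0
    return len(feats_a & feats_b) / len(feats_a | feats_b)


def text_similarity(aspect_a, aspect_b):
    """Character-level comparison of the aspect strings themselves."""
    return difflib.SequenceMatcher(None, aspect_a.lower(), aspect_b.lower()).ratio()


print(path_similarity(["https://a.com/x/1.html"], ["https://a.com/x/2.html"]))
print(text_similarity("beaches", "beach"))
```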

Once the similarity scores for each pair of aspects are identified, the similarity scores can be used to identify candidate aspects that should be combined into a single aspect. Various clustering techniques can be used to determine when two candidate aspects should be combined. For example, a graph partition algorithm can be used. The graph partition algorithm creates a graph in which the nodes are the aspects and an edge connects two nodes if they are sufficiently similar (e.g., if their similarity score exceeds a threshold). For example, in FIG. 3, there is an edge (indicated by a solid line) between Aspect 1 and Aspect 1′, because the similarity score between Aspect 1 and Aspect 1′ is greater than the threshold value. There are no other edges in the graph. The threshold value can be determined empirically, for example, based on a set of test aspects. The graph partition algorithm then combines connected aspects into a single aspect. For example, in FIG. 3, the resulting set of aspects 316 lists only Aspect 1, Aspect 2, and Aspect 3, because Aspect 1′ has been combined with Aspect 1.

Combining two aspects can include keeping one aspect in the group of aspects and removing the other from the group. The decision of which aspect to keep can be made, for example, by selecting the aspect with the higher popularity score. Aspect popularity scores are discussed in more detail below.
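A minimal sketch of the graph partition approach, assuming a pairwise similarity function and a popularity function are already available: an edge is added for each pair whose score exceeds the threshold, connected components are found with a union-find structure (an implementation choice, not something the text prescribes), and each component is replaced by its most popular member. The scores in the example are made up.

```python
from itertools import combinations


def combine_similar_aspects(aspects, similarity, popularity, threshold):
    """Graph partition: link aspects whose similarity exceeds threshold, then keep
    the most popular aspect from each connected component."""
    parent = {a: a for a in aspects}

    def find(a):
        # Follow parent links to the component root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(aspects, 2):
        if similarity(a, b) > threshold:  # an edge in the aspect graph
            parent[find(a)] = find(b)

    components = {}
    for a in aspects:
        components.setdefault(find(a), []).append(a)
    return [max(members, key=popularity) for members in components.values()]


# FIG. 3-style example: only Aspect 1 and Aspect 1' are linked.
scores = {frozenset(["Aspect 1", "Aspect 1'"]): 0.9}
popularity = {"Aspect 1": 0.5, "Aspect 1'": 0.2, "Aspect 2": 0.4, "Aspect 3": 0.3}
result = combine_similar_aspects(
    list(popularity),
    similarity=lambda a, b: scores.get(frozenset([a, b]), 0.1),
    popularity=popularity.get,
    threshold=0.5,
)
print(result)  # ['Aspect 1', 'Aspect 2', 'Aspect 3']
```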

Other clustering techniques can also be used, for example, k-means clustering (in which aspects are divided into a pre-defined number of clusters based on the similarity scores), spectral clustering, hierarchical clustering, and star clustering.

The candidate aspects can be grouped based on their classes. Aspect classes can be determined in the same way as entity classes, for example, as described above. In some implementations, determining an aspect class includes determining a synonym for the aspect and then determining the synonym's class. For example, “New York University” is frequently abbreviated as “NYU.” However, it may be difficult to determine an aspect class for “NYU,” because many knowledge bases classify only one of the possible names for a given entity, so there may be no data on which to base a classification of “NYU.” The more formal “New York University” is more likely to be included in knowledge bases. Therefore, a class for “NYU” can be determined by associating “NYU” with its synonym “New York University” and then identifying a class for the synonym. Synonyms can be determined, for example, by looking the aspect up in a thesaurus or a dictionary. Synonyms can also be determined, for example, by using the redirect pages of a knowledge base such as Wikipedia™. The redirect pages map various terms to a synonym that is classified by Wikipedia™.

Aspects can be dissimilar according to their similarity scores but still be related in the sense that they belong to the same class. When this occurs, the aspects can be grouped into that class. For example, the aspects “New York,” “San Francisco,” and “Washington DC” are different because they refer to different cities with different food, culture, streets, etc., yet they can all be associated with the class “U.S. cities.” Thus, the aspects can be grouped into the class “U.S. cities.” In some implementations, aspects are grouped into a sub-class of their class. For example, “New York” and “Washington DC” are members of the class “U.S. cities” and of the sub-class “East coast cities.” Therefore, they could alternatively be grouped together into “East coast cities.”

FIG. 4 illustrates an example of grouping aspects based on their aspect classes. Each aspect in a group of aspects 402 is associated with a respective class. Aspect 1 and Aspect 3 are both in Class 1, while Aspect 2 is in Class 2. When the aspects are grouped based on their classes, the new group of aspects 404 includes Aspect 2 and Class 1. Aspect 2 remains unchanged in the new group of aspects 404, because its class does not match the class of any other aspect. Aspect 1 and Aspect 3 are combined into a new aspect equal to their shared class, Class 1.

In some implementations, some aspects are associated with multiple classes. Determining a class for these ambiguous aspects can be problematic. For example, consider an entity “Vietnam” and two aspects, “food” and “history.” Both aspects are ambiguous. In addition to referring to something you can eat, “food” could refer to the “F.O.O.D.” music album. In addition to referring to something in the past, “history” could refer to the “HIStory: Past, Present and Future, Book 1” music album. Thus, the two ambiguous aspects could both be classified as “album” and then grouped together into an “album” aspect. However, food and history are two distinct aspects for exploring Vietnam, and there is value in keeping them separate, so they should not be grouped together. In some implementations, ambiguous aspects are not grouped, in order to avoid this problem.

Ambiguous aspects can be identified, for example, by using a disambiguation database that identifies aspects with multiple meanings. Ambiguous aspects can also be identified, for example, by using the disambiguation pages of a website such as Wikipedia™, which list multiple meanings for a given term.
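A minimal sketch of class-based grouping that combines the synonym fallback with the rule that ambiguous aspects are left ungrouped. The lookup tables are plain dictionaries and a set standing in for a knowledge base, a synonym source, and a disambiguation database; their names and shapes are assumptions.

```python
def group_by_class(aspects, aspect_classes, synonyms, ambiguous):
    """Replace aspects that share a class with one aspect equal to that class (FIG. 4).

    aspect_classes: hypothetical knowledge-base lookup, name -> class
    synonyms: hypothetical synonym table (e.g., built from redirect pages), name -> synonym
    ambiguous: names with multiple meanings; these are never grouped
    """
    def class_of(aspect):
        if aspect in ambiguous:
            return None
        if aspect in aspect_classes:
            return aspect_classes[aspect]
        # Fall back to the synonym's class, e.g., "NYU" -> "New York University".
        return aspect_classes.get(synonyms.get(aspect))

    members = {}
    for aspect in aspects:
        cls = class_of(aspect)
        if cls is not None:
            members.setdefault(cls, []).append(aspect)

    grouped, emitted = [], set()
    for aspect in aspects:
        cls = class_of(aspect)
        if cls is not None and len(members[cls]) > 1:
            if cls not in emitted:  # emit each shared class once, where its first member was
                grouped.append(cls)
                emitted.add(cls)
        else:
            grouped.append(aspect)  # unclassified, ambiguous, or alone in its class
    return grouped


classes = {"Aspect 1": "Class 1", "Aspect 2": "Class 2", "Aspect 3": "Class 1"}
print(group_by_class(["Aspect 1", "Aspect 2", "Aspect 3"], classes, {}, set()))
# ['Class 1', 'Aspect 2']
```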

In some implementations, once the modified group of candidate aspects is determined, the group is filtered, for example, to remove potentially offensive aspects (e.g., porn filtering). This filtering can be done by comparing the aspects to a list of potentially offensive aspects and removing any aspects that are on the list.

As shown in FIG. 2, the system ranks one or more of the candidate aspects for the entity (step 280). The candidate aspects are ranked based on a diversity score and a popularity score of each aspect. The goal of the ranking is to identify aspects that are both interesting to the user and diverse enough to give the user choices about where to direct his or her search next. Ranking can be performed as follows.

The highest ranked aspect is the aspect with the highest popularity score. The popularity score is a measure of how common the aspect is. Popularity scores can be calculated in various ways, depending on how the aspect was generated.

When the aspect was generated as a query refinement, the popularity score can be based on the frequency with which the query refinement appears, for example, by taking the number of sessions in which the query refinement appears and dividing it by the total number of sessions.

For example, a popularity score p_(r)(q_(j)|q) of a refinement q_(j) of a query q can be calculated as follows:

${{p_{r}\left( q_{j} \middle| q \right)} = \frac{{fq}\left( q_{j} \right)}{\Sigma_{j}{{fq}\left( q_{j} \right)}}},$

where fq(q_(j)) is the frequency with which the query refinement q_(j) appears in the user search histories.
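The refinement score is a simple normalization. A sketch, assuming the frequencies fq(q_(j)) have already been counted from the search histories into a dictionary; the function name and the sample counts are hypothetical.

```python
def refinement_popularity(refinement_counts):
    """p_r(q_j | q) = fq(q_j) / sum_j fq(q_j) for each refinement q_j of the query q."""
    total = sum(refinement_counts.values())
    if total == 0:
        return {q_j: 0.0 for q_j in refinement_counts}
    return {q_j: count / total for q_j, count in refinement_counts.items()}


# Made-up refinement frequencies for the query "hawaii".
print(refinement_popularity({"hawaii beaches": 30, "hawaii hotels": 10}))
# {'hawaii beaches': 0.75, 'hawaii hotels': 0.25}
```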

When the aspect was generated as a query super-string, the popularity score can be based on the frequency with which the query super-string appears in the user search histories, for example, by dividing the number of times the super-string appears in the search histories by the total number of query super-strings in the user search histories plus the number of times that a query for the entity appears in the user search histories.

For example, the popularity score p_(ss)(q_(j)|q) for a given query super-string q_(j) can be calculated as follows:

${{p_{ss}\left( q_{j} \middle| q \right)} = \frac{{fq}\left( q_{j} \right)}{{{fq}(q)} + {\Sigma_{j}{{fq}\left( q_{j} \right)}}}},$

where fq(q_(j)) is the frequency with which the super-string query q_(j) appears in the search histories, and fq(q) is the frequency with which the query for the entity appears in the search histories.

The popularity score can also be calculated by dividing the number of times the super-string appears in the search histories by the total number of query super-strings in the search histories, for example:

${p_{ss}\left( q_{j} \middle| q \right)} = \frac{{fq}\left( q_{j} \right)}{\Sigma_{j}{{fq}\left( q_{j} \right)}}$

When a query refinement and a query super-string are both identified as candidate aspects, the two can be combined into a single aspect. The popularity score for that aspect can be determined in a number of ways, including, for example, taking the higher of the two scores, the average of the two scores, or the lower of the two scores.

For example, the score p_(inst)(q_(j)|q) for an aspect associated with a given query q_(j) that is both a query refinement and a query super-string can be calculated as follows:

$p_{inst}(q_{j} \mid q) = \max\left(p_{r}(q_{j} \mid q),\, p_{ss}(q_{j} \mid q)\right).$
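A sketch of the super-string score (with either denominator) and of the max-based combination, again assuming precomputed frequency dictionaries; the function names are illustrative. Taking the maximum is only one of the options listed above; an average or a minimum would be a one-line change.

```python
def superstring_popularity(superstring_counts, entity_query_count=None):
    """p_ss(q_j | q) for each query super-string q_j.

    With entity_query_count = fq(q), the denominator is fq(q) + sum_j fq(q_j);
    with entity_query_count=None, it is sum_j fq(q_j) alone (the alternative form).
    """
    denominator = sum(superstring_counts.values()) + (entity_query_count or 0)
    if denominator == 0:
        return {q_j: 0.0 for q_j in superstring_counts}
    return {q_j: count / denominator for q_j, count in superstring_counts.items()}


def instance_popularity(p_r, p_ss):
    """p_inst(q_j | q) = max(p_r(q_j | q), p_ss(q_j | q)); a missing score counts as 0."""
    return {q_j: max(p_r.get(q_j, 0.0), p_ss.get(q_j, 0.0)) for q_j in p_r.keys() | p_ss.keys()}


counts = {"hawaii beaches": 20, "hawaii volcanoes": 20}
p_ss = superstring_popularity(counts, entity_query_count=60)
print(p_ss)  # each super-string: 20 / (60 + 40) = 0.2
print(instance_popularity({"hawaii beaches": 0.75}, p_ss))
```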

When an aspect is identified by analyzing query log data for other class member entities in the same class as the entity, the popularity score can be generated for the aspect as described above, e.g.:

$p_{inst}(a_{i} \mid q) = \max\left(p_{r}(a_{i} \mid q),\, p_{ss}(a_{i} \mid q)\right).$

The popularity score for class-based aspects can be adjusted so that aspects associated with the class do not overwhelm aspects associated with the specific entity. Rarer entities need the class-based aspects in order to have a sufficient number and variety of aspects. However, more popular entities may have entity-based aspects that are more important than the class-based aspects. A balance can be struck by weighting the scores of the aspects.

For example, a candidate aspect a_(i) of a query q that contains an entity of class C can be assigned a weighted score p(a_(i)|q) as follows:

${{p\left( a_{i} \middle| q \right)} = \frac{{p_{inst}\left( a_{i} \middle| q \right)} + {K \times {p_{class}\left( a_{i} \middle| C \right)}}}{{\Sigma_{j}{p_{inst}\left( a_{j} \middle| q \right)}} + {K \times \Sigma_{j}{p_{class}\left( a_{j} \middle| C \right)}}}},$

where K is a design parameter that controls the relative importance of the individual score of the aspect and the class score and can be determined empirically, and

${{p_{class}\left( a \middle| C \right)} = \frac{{Count}(a)}{C}},$

where count(a) is the number of queries in the query log that included the aspect a, and |C| is the number of entities in the class C.
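A sketch of the class-weighted score, assuming p_inst has already been computed for the entity and that per-aspect query counts for the class are available. K is passed in as `k`; its value would be tuned empirically, and the value in the example is arbitrary. Larger values of `k` shift weight toward class-based aspects, which helps rare entities with few aspects of their own.

```python
def class_popularity(aspect_query_counts, class_size):
    """p_class(a | C) = count(a) / |C|."""
    return {a: count / class_size for a, count in aspect_query_counts.items()}


def weighted_popularity(p_inst, p_class, k):
    """p(a_i|q) = (p_inst(a_i|q) + K p_class(a_i|C)) / (sum_j p_inst(a_j|q) + K sum_j p_class(a_j|C))."""
    aspects = p_inst.keys() | p_class.keys()
    denominator = sum(p_inst.values()) + k * sum(p_class.values())
    if denominator == 0:
        return {a: 0.0 for a in aspects}
    return {a: (p_inst.get(a, 0.0) + k * p_class.get(a, 0.0)) / denominator for a in aspects}


p_class = class_popularity({"hotels": 40, "museums": 40}, class_size=80)
scores = weighted_popularity({"beaches": 0.6, "hotels": 0.4}, p_class, k=1.0)
print(scores)  # beaches 0.3, hotels 0.45, museums 0.25
```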

The popularity score for a class-based candidate aspect can also reflect how close the entity is to the class member on which the aspect is based, e.g., in time, in space, or from another perspective. For example, if the entity is “November,” an aspect based on the class member “December” might have a better score than an aspect based on the class member “May,” because November is closer to December than to May in the order of months. As another example, if the entity is “San Francisco,” an aspect based on “Los Angeles” might have a better score than an aspect based on “New York,” because San Francisco is geographically closer to Los Angeles than to New York.

Other popularity scores are also envisioned. For example, the popularity score can be based on the click-through rate for a given aspect, e.g., the number of times users selected a search result after issuing a query for the aspect (or the entity and the aspect), divided by the total number of times users issued queries for the aspect. The popularity score can also be based on the dwell time associated with one or more of the search results corresponding to a query for the aspect, or for the aspect and the entity. Dwell time is the amount of time a user spends viewing a search result. Dwell time can be a continuous number, such as the number of seconds a user spends viewing a search result, or it can be a discrete interval, for example, “short clicks” corresponding to clicks of less than thirty seconds, “medium clicks” corresponding to clicks of more than thirty seconds but less than one minute, and “long clicks” corresponding to clicks of more than one minute. In some implementations, a longer dwell time for one or more results is associated with a higher popularity score, because users found those results useful enough to view them for a longer period of time.
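The click-through and dwell-time signals reduce to simple arithmetic. In this sketch the boundary cases (exactly thirty seconds or exactly one minute), which the text leaves open, are assigned to the longer bucket as an assumption; the function names are hypothetical.

```python
def click_through_rate(clicked_queries, issued_queries):
    """Fraction of queries for the aspect after which the user selected a search result."""
    return clicked_queries / issued_queries if issued_queries else 0.0


def dwell_time_bucket(seconds):
    """Map a dwell time in seconds to the short/medium/long click intervals."""
    if seconds < 30:
        return "short"
    if seconds < 60:  # assumption: exactly 30 s counts as medium
        return "medium"
    return "long"  # assumption: exactly 60 s counts as long


print(click_through_rate(25, 100), dwell_time_bucket(45))  # 0.25 medium
```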

Once the first aspect is ranked, subsequent aspects are ranked based on their popularity scores and a diversity score, e.g., a measure of how similar they are to the already ranked aspects. The diversity score for an unranked aspect can be generated, for example, by calculating a similarity score between the unranked aspect and each ranked aspect, and then taking the minimum, maximum, or average of those scores.

FIG. 5 illustrates an example of ranking an unranked aspect 502, given a pre-existing group of one or more ranked aspects 508.

A popularity score 506 is generated for the unranked aspect 502 using a popularity score generator 504. The popularity score generator generates a popularity score for the aspect, for example, as described above. A diversity score 512 is then generated for the unranked aspect 502 by the diversity score generator 510. The diversity score 512 is an estimate of how similar the unranked aspect 502 is to the ranked aspects 508. The diversity score between the unranked aspect 502 and the set of ranked aspects 508 can be determined by calculating the similarity score between the unranked aspect 502 and each ranked aspect in the set 508, for example, as described above, and then using the minimum, maximum, average, or sum of the scores as the diversity score.

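Once the popularity score 506 and the diversity score 512 are generated, they are passed to an overall score generator 514. The overall score generator 514 generates an overall score 516 based on the popularity score 506 and the diversity score 512, for example, by dividing the popularity score 506 by the diversity score 512. Because the diversity score measures similarity to the already ranked aspects, this division favors popular aspects that are unlike the aspects already ranked.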
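The ranking loop of FIG. 5 can be sketched as a greedy procedure. This version assumes the maximum similarity to the ranked aspects as the diversity score (one of the options listed above) and adds a small epsilon, an assumption not in the text, so that an aspect with zero similarity to every ranked aspect does not cause a division by zero. The scores in the example are made up.

```python
def rank_aspects(aspects, popularity, similarity, limit=None, eps=1e-9):
    """Greedy ranking: the most popular aspect first, then repeatedly the aspect with the
    highest popularity / diversity, where diversity is the maximum similarity to any
    already ranked aspect."""
    remaining = list(aspects)
    if not remaining:
        return []
    ranked = [max(remaining, key=popularity)]
    remaining.remove(ranked[0])

    def overall_score(aspect):
        diversity = max(similarity(aspect, r) for r in ranked)
        return popularity(aspect) / max(diversity, eps)  # eps guards against division by zero

    while remaining and (limit is None or len(ranked) < limit):
        best = max(remaining, key=overall_score)
        ranked.append(best)
        remaining.remove(best)
    return ranked if limit is None else ranked[:limit]


pop = {"beaches": 0.5, "beach": 0.4, "hotels": 0.3}
sims = {frozenset(["beaches", "beach"]): 0.9}
sim = lambda a, b: sims.get(frozenset([a, b]), 0.1)
print(rank_aspects(pop, pop.get, sim))  # ['beaches', 'hotels', 'beach']
```

Although "beach" is more popular than "hotels," it is nearly a duplicate of "beaches," so its overall score (0.4 / 0.9) falls below that of "hotels" (0.3 / 0.1).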

Other methods of ranking the candidate aspects are also envisioned. For example, the highest ranked candidate aspect can be chosen based on the popularity score, and all subsequent aspects can be chosen based on the diversity score (for example, by choosing the aspect with the lowest diversity score). The candidate aspects can also be ranked based only on their popularity scores or only on their diversity scores.

Returning to FIG. 2, the system then associates a number of the highest ranked candidate aspects with the entity, or with the entity and its properties (step 210). Any number of candidate aspects can be associated with the entity (and its properties), depending on the needs and the storage capabilities of the system. For example, if the system will present the aspects to the user in a graphical environment where only a few aspects can be displayed at a time, the number of aspects may be small. In contrast, if the system might provide a large number of aspects to a user or process, the number of candidate aspects may be larger.

Once the highest ranked candidate aspects are associated with the entity (and its properties), the association is stored in a location accessible to the system, for example, in a database that associates a given entity with its aspects.

FIG. 6 illustrates an example method 600 for receiving a query including one or more terms corresponding to an entity and presenting search results based on the identified aspects of the entity. For convenience, the example method 600 will be described with reference to a system (e.g., the search system 114 of FIG. 1 or another system) that performs the method 600. The method can be performed in conjunction with the method described above in reference to FIG. 2.

The system receives a query including one or more terms corresponding to an entity (step 602). The query can be received, for example, from a user or from the search system 114. In some implementations, the system and the search system 114 are the same system.

The system identifies aspects associated with the entity (step 604). In some implementations, the query includes an entity and its properties, and the system can identify aspects associated with the entity and its properties. For example, if the query is “Hawaii vacation,” then “Hawaii” could be identified as the entity, and “vacation” could be identified as a property of the entity “Hawaii.” The aspects can be identified as described above in reference to FIG. 2, or can be retrieved, for example, from a database including ranked aspects generated using the method described above in reference to FIG. 2. The system can identify all aspects associated with the entity. When the aspects are ranked, the system can alternatively identify the top k ranked aspects, where k is the number of aspects that are going to be presented to the user.

The system receives one or more sets of search results (step 606). Each set of search results corresponds to the entity and one of the identified aspects. For example, if the entity is “Hawaii” and the identified aspects are “beaches,” “hotels,” “weather,” and “food,” separate sets of search results could be received for “Hawaii beaches,” “Hawaii hotels,” “Hawaii weather,” and “Hawaii food.” The search results can be received in response to a query issued to the search engine 130 for the entity and an aspect.

The system presents the search results based on the identified aspects (step 608). In some implementations, the search results are presented in a “mashup,” in which relevant results and other information for one or more of the aspects are presented in one display, organized according to aspect.

FIG. 7 illustrates an example mashup displayed after a user submits a search query 702 for “mount bachelor” by clicking on the search button 704. Search results and other information corresponding to aspects for Mount Bachelor (e.g., “weather,” “hotels,” “community college,” and “mountains”) are labeled in accordance with the aspect and presented to the user in the boxes 706, 708, 710, and 712. The presentation of information can be tailored to the aspect. For example, a ski and snow report is presented in box 706 for users interested in the “weather” aspect. Search results corresponding to “hotels” are presented in box 708, search results corresponding to “community college” are presented in box 710, and search results corresponding to “mountains” are presented in box 712.

As FIG. 7 illustrates, not all of the search results received for a given aspect are necessarily presented for that aspect. For example, more search results for the “hotels” aspect can be received than the two search results that are presented. The presented search results are chosen from the received search results, for example, by taking a top number of search results based on a ranking of the search results (e.g., a ranking provided by the search system 114). The number can be determined, for example, based on the number of aspects for the entity and/or the space available for presenting the search results. Search results do not have to be presented for all identified aspects.
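One way to combine steps 606 and 608 is sketched below. The `search` callable, the `total_slots` parameter, and the even split of display slots across aspects are all assumptions; the text only says the number of results can depend on the number of aspects and on the available space.

```python
def results_per_aspect(entity, aspects, search, total_slots):
    """Query '<entity> <aspect>' for each aspect and keep the top-ranked results.

    search: hypothetical callable returning a ranked list of results for a query string.
    total_slots: hypothetical number of result slots available in the display.
    """
    if not aspects:
        return {}
    per_aspect = max(1, total_slots // len(aspects))  # split the space evenly across aspects
    return {aspect: search(f"{entity} {aspect}")[:per_aspect] for aspect in aspects}


def fake_search(query):
    # Stand-in for a search system: five ranked results per query.
    return [f"{query} result {rank}" for rank in range(1, 6)]


print(results_per_aspect("Hawaii", ["beaches", "hotels"], fake_search, total_slots=4))
```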

In some implementations, a summary of the entity in accordance with one of the aspects is presented. A summary of an entity in accordance with an aspect is a direct presentation of information that is available through search results corresponding to the entity and the aspect. For example, the ski and snow report presented in box 706 is a summary of information for the entity “mount bachelor” and the aspect “weather.” A user interested in the “weather” aspect is likely interested in knowing the current weather, so rather than requiring the user to click on a search result to see weather information, the system can instead directly present information on the weather. As another example, if the entity is “University of Southern California football team” and the aspect is “season record,” a summary of the team's season record can be presented. As yet another example, if the entity is a particular movie and the aspect is movie reviews, then multiple reviews can be presented side by side. In some implementations, the summary is associated with an aspect and an entity in advance and stored, for example, in a database. The system can then retrieve the summary when needed.

Other methods of presenting the search results based on the aspects are also envisioned. For example, the system can create a separate web page for the search results corresponding to each aspect. Links to the web pages corresponding to the identified aspects can be presented along with search results for the original query. Alternatively, the links to the web pages can be presented on a separate web page. The system can also present the aspects as “related search” options for the user, and then present the search results corresponding to a given aspect once the user selects that aspect.

In some implementations, the query includes terms corresponding to multiple entities. When the query includes multiple entities, the system can identify the aspects associated with each entity and then combine the identified aspects based on their rank (e.g., based on a popularity score and a diversity score of each aspect). Search results for the top ranked aspects can then be received and presented to the user. Alternatively, the system can present search results for the aspects corresponding to each of the entities separately.

In some implementations, the system receives search results corresponding to the entity, rather than to the entity and an aspect (for example, from the search system 114). In these implementations, the search results can be grouped based on the aspects, for example, by sorting the search results based on the aspects, or by using clustering techniques to cluster the search results around the aspects. The search results can then be presented based on the aspects as described above.

FIG. 8 illustrates an example architecture of a system 800. The system generally includes a data processing apparatus 802 and a user device 828. The data processing apparatus 802 and user device 828 are connected through a network 826. In some implementations, the user device 828 and the data processing apparatus 802 are the same device.

While the data processing apparatus 802 is shown as a single data processing apparatus, a plurality of data processing apparatus may be used. The data processing apparatus 802 runs a number of modules, for example, processes, e.g., executable software programs. In various implementations, these processes include an entity-class associator 804, an aspect generator 806, an aspect combiner 808, an aspect grouper 810, an aspect ranker 812, and an aspect associator 814.

The entity-class associator 804 associates a given entity with a class, for example, based on a pre-defined database that associates entities with classes or by accessing knowledge base information for the entity.

The aspect generator 806 generates aspects for a given entity, for example, as described above in reference to FIG. 2, by analyzing user search histories to identify query refinements and query super-strings for the entity, its class members, or the entity and its class members.

The aspect combiner 808 combines aspects based on their similarity scores, for example, as described above in reference to FIGS. 2 and 3. The aspect combiner 808 may also calculate similarity scores for pairs of aspects as described above in reference to FIGS. 2 and 3.

The aspect grouper 810 groups aspects based on their classes, for example, as described above in reference to FIGS. 2 and 4. In some implementations, the aspect combiner 808 and the aspect grouper 810 are the same process.

The aspect ranker 812 ranks aspects based on a popularity score and a diversity score of each aspect, for example, as described above in reference to FIGS. 2 and 5.

The aspect associator 814 associates one or more aspects with a given entity, or with a given entity and its properties, for example, as described above in reference to FIG. 2.

In some implementations, the data processing apparatus 802 stores one or more of an entity-class database associating a given entity with its class, an aspect-class database associating a given aspect with its class, user search histories, and an entity-aspect database associating a given entity with one or more aspects. In some implementations, the entity-class database and the aspect-class database are the same database. In some implementations, the data is stored on a computer readable medium 820. In some implementations, the data is stored on the additional device(s) 818.

The data processing apparatus 802 may also have hardware or firmware devices including one or more processors 816, one or more additional devices 818, a computer readable medium 820, a communication interface 822, and one or more user interface devices 824. The processor(s) 816 are capable of processing instructions for execution. In one implementation, at least one of the processor(s) 816 is a single-threaded processor. In another implementation, at least one of the processor(s) 816 is a multi-threaded processor. The processor(s) 816 are capable of processing instructions stored in memory or on a storage device to display graphical information for a user interface on the user interface device(s) 824. User interface device(s) 824 can include, for example, a display, a camera, a speaker, a microphone, or a tactile feedback device.

The data processing apparatus 802 communicates with the user device 828 using its communication interface 822.

The user device 828 can be any data processing apparatus, for example, a user's computer. A user uses the user device 828 to submit search queries through the network 826 to the data processing apparatus 802 and to receive search results from the data processing apparatus 802, for example, through a web browser run on the user device, such as Firefox™, available from the Mozilla Project in Mountain View, Calif. The user device 828 may present the search results to the user, for example, by displaying the results on a display device, transmitting sound corresponding to the results, or providing tactile feedback corresponding to the results. The search results may be organized according to aspects associated with the entity. When a user uses his or her computer to select a search result to view, information regarding the user selection can be sent to the data processing apparatus 802 and used to generate user search history data.

In some implementations, the user device 828 runs one or more of the modules 804, 806, 808, 810, 812, and 814 instead of, or in addition to, the data processing apparatus 802 running the modules.

While the system 800 of FIG. 8 envisions a user who submits a search query through his or her computer, the search query does not have to be received from a user or a user's computer, but can be received from any data processing apparatus, process, or person, for example, a computer or a process run on a computer, with or without direct user input. Similarly, the results and aspects do not have to be presented to the user's computer but can be presented to any data processing apparatus, process, or person. The user search histories can be received from a population of users, and not necessarily from the same user device 828 used to receive search results organized based on aspects of an entity in the search query.

Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus.

Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them.

The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.

The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or combinations of them. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, e.g., a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

While this specification contains many specifics, these should not be construed as limitations on the scope of the invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the invention. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results.

What is claimed is:
 1. A method comprising: receiving a query in a computer system, the computer system comprising one or more computers and the query including one or more terms corresponding to an entity; identifying, by the computer system, a plurality of aspects associated with the entity in at least one database; identifying, by the computer system, a plurality of search results, the search results including a first set of the search results based on the entity and a first aspect of the aspects and a second set of the search results based on the entity and a second aspect of the aspects; and providing, in response to the query, a presentation of the search results in one display, the presentation including a plurality of visually distinct aspect areas with each of the aspect areas being for a corresponding one of the aspects and including a corresponding label, wherein providing the presentation of the search results comprises: presenting at least one of the search results of the first set in a first aspect area of the aspect areas, the first aspect area corresponding to the first aspect; and presenting at least one of the search results of the second set in a second aspect area of the aspect areas, the second aspect area corresponding to the second aspect.
 2. The method of claim 1, wherein a first search result of the first search results is provided in the first aspect area and wherein the first search result is a summary of information about the entity in accordance with the first aspect.
 3. The method of claim 1, wherein the search results further include a third set of search results responsive to the query and providing the presentation of the search results further comprises presenting at least some of the third set of search results in the one display.
 4. The method of claim 3, wherein identifying the first set of search results comprises receiving the first set of search results in response to issuing a first aspect query that is based on the entity and the first aspect and wherein identifying the second set of search results comprises receiving the second set of search results in response to issuing a second aspect query that is based on the entity and the second aspect.
 5. The method of claim 4, wherein the first aspect query includes the one or more terms corresponding to the entity and at least one first aspect term corresponding to the first aspect and wherein the second aspect query includes the one or more terms corresponding to the entity and at least one second aspect term corresponding to the second aspect.
 6. The method of claim 1, wherein the search results of the first set that are presented in the first aspect area include a first search result that includes a first link to a first web page.
 7. The method of claim 1, further comprising: generating the association of the plurality of aspects to the entity in the database.
 8. The method of claim 7, wherein generating the association of the plurality of aspects to the entity in the database comprises: generating a group of candidate aspects for the entity; for each pair of one or more pairs of candidate aspects, calculating a similarity score for the pair based on identifying respective aspect sets of search results corresponding to respective queries of candidate aspects in the pair of candidate aspects and comparing the aspect sets of search results; modifying the group of candidate aspects to generate a group of modified candidate aspects based on the similarity score for the candidate aspects; and selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database.
 9. The method of claim 8, wherein selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database comprises: ranking the modified candidate aspects based on a diversity score and a popularity score; and selecting the one or more of the modified candidate aspects as the aspects to associate with the entity in the database based on the ranking.
 10. A system comprising: one or more processors; and a computer storage medium including instructions, which, when executed by the processors, cause the processors to perform operations comprising: receiving a query in a computer system, the computer system comprising one or more computers and the query including one or more terms corresponding to an entity; identifying, by the computer system, a plurality of aspects associated with the entity in at least one database; identifying, by the computer system, a plurality of search results, the search results including a first set of the search results based on the entity and a first aspect of the aspects and a second set of the search results based on the entity and a second aspect of the aspects; and providing, in response to the query, a presentation of the search results in one display, the presentation including a plurality of visually distinct aspect areas with each of the aspect areas being for a corresponding one of the aspects and including a corresponding label, wherein providing the presentation of the search results comprises: presenting at least one of the search results of the first set in a first aspect area of the aspect areas, the first aspect area corresponding to the first aspect; and presenting at least one of the search results of the second set in a second aspect area of the aspect areas, the second aspect area corresponding to the second aspect.
 11. The system of claim 10, wherein a first search result of the first search results is provided in the first aspect area and wherein the first search result is a summary of information about the entity in accordance with the first aspect.
 12. The system of claim 10, wherein the search results further include a third set of search results responsive to the query and providing the presentation of the search results further comprises presenting at least some of the third set of search results in the one display.
 13. The system of claim 12, wherein identifying the first set of search results comprises receiving the first set of search results in response to issuing a first aspect query that is based on the entity and the first aspect and wherein identifying the second set of search results comprises receiving the second set of search results in response to issuing a second aspect query that is based on the entity and the second aspect.
 14. The system of claim 13, wherein the first aspect query includes the one or more terms corresponding to the entity and at least one first aspect term corresponding to the first aspect and wherein the second aspect query includes the one or more terms corresponding to the entity and at least one second aspect term corresponding to the second aspect.
 15. The system of claim 10, wherein the search results of the first set that are presented in the first aspect area include a first search result that includes a first link to a first web page.
 16. The system of claim 10, wherein the instructions further include instructions that, when executed by the processors, cause the processors to perform an operation comprising: generating the association of the plurality of aspects to the entity in the database.
 17. The system of claim 16, wherein generating the association of the plurality of aspects to the entity in the database comprises: generating a group of candidate aspects for the entity; for each pair of one or more pairs of candidate aspects, calculating a similarity score for the pair based on identifying respective aspect sets of search results corresponding to respective queries of candidate aspects in the pair of candidate aspects and comparing the aspect sets of search results; modifying the group of candidate aspects to generate a group of modified candidate aspects based on the similarity score for the candidate aspects; and selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database.
 18. The system of claim 17, wherein selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database comprises: ranking the modified candidate aspects based on a diversity score and a popularity score; and selecting the one or more of the modified candidate aspects as the aspects to associate with the entity in the database based on the ranking.
 19. A non-transitory computer storage device comprising instructions that when executed by an apparatus cause the apparatus to perform operations comprising: receiving a query in a computer system, the computer system comprising one or more computers and the query including one or more terms corresponding to an entity; identifying, by the computer system, a plurality of aspects associated with the entity in at least one database; identifying, by the computer system, a plurality of search results, the search results including a first set of the search results based on the entity and a first aspect of the aspects and a second set of the search results based on the entity and a second aspect of the aspects; and providing, in response to the query, a presentation of the search results in one display, the presentation including a plurality of visually distinct aspect areas with each of the aspect areas being for a corresponding one of the aspects and including a corresponding label, wherein providing the presentation of the search results comprises: presenting at least one of the search results of the first set in a first aspect area of the aspect areas, the first aspect area corresponding to the first aspect; and presenting at least one of the search results of the second set in a second aspect area of the aspect areas, the second aspect area corresponding to the second aspect.
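The aspect-selection steps recited in claims 8 and 9 (and in claims 17 and 18) can be illustrated with a short sketch. It is a hypothetical illustration under stated assumptions, not the claimed implementation. The claims do not specify how aspect sets of search results are compared, how similar candidates are combined, or how the diversity and popularity scores are computed and weighted.

The sketch makes four assumptions:

- Each aspect query is the entity terms followed by one aspect term, as in claims 5 and 14.
- The similarity score is the Jaccard overlap of the result URLs returned for two aspect queries.
- Candidates are combined greedily when that overlap reaches a threshold of 0.5.
- Ranking is a linear blend of normalized popularity and a diversity score, where diversity is one minus the largest overlap with an already selected aspect.

The search backend, `merge_similar`, `rank_aspects`, and all parameter values are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical search backend: takes a query string, returns result URLs.
SearchFn = Callable[[str], List[str]]


def aspect_query(entity: str, aspect: str) -> str:
    """Form an aspect query from the entity terms plus one aspect term."""
    return f"{entity} {aspect}"


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity score: overlap of two result sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar(
    entity: str, candidates: List[str], search: SearchFn, threshold: float = 0.5
) -> Tuple[List[List[str]], Dict[str, Set[str]]]:
    """Combine candidate aspects whose aspect result sets overlap enough.

    Each candidate joins the first existing group whose representative
    (its first member) has a similarity score >= threshold with it;
    otherwise the candidate starts a new group.
    """
    results = {c: set(search(aspect_query(entity, c))) for c in candidates}
    groups: List[List[str]] = []
    for cand in candidates:
        for group in groups:
            if jaccard(results[cand], results[group[0]]) >= threshold:
                group.append(cand)
                break
        else:  # no sufficiently similar group was found
            groups.append([cand])
    return groups, results


def rank_aspects(
    groups: List[List[str]],
    results: Dict[str, Set[str]],
    popularity: Dict[str, float],
    k: int,
    weight: float = 0.5,
) -> List[str]:
    """Greedily pick up to k modified candidates by popularity and diversity."""
    # One entry per modified candidate: (label, summed popularity, merged results).
    merged = [
        (g[0], sum(popularity.get(m, 0.0) for m in g), set().union(*(results[m] for m in g)))
        for g in groups
    ]
    max_pop = max((p for _, p, _ in merged), default=0.0) or 1.0
    selected: List[Tuple[str, float, Set[str]]] = []
    remaining = list(merged)

    def score(item: Tuple[str, float, Set[str]]) -> float:
        _, pop, res = item
        # Diversity: 1 minus the largest overlap with any already-selected aspect.
        diversity = 1.0 - max((jaccard(res, s[2]) for s in selected), default=0.0)
        return weight * pop / max_pop + (1.0 - weight) * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [label for label, _, _ in selected]
```

Consider a toy index in which the queries `jaguar price` and `jaguar cost` return mostly the same pages. Here `merge_similar` places the two candidates in one group. `rank_aspects` then scores that group once, with their popularity summed. The greedy loop recomputes diversity after each pick, so an aspect whose results largely duplicate an already selected aspect is pushed down the ranking.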